Abstract



In long-term deployments of sensor networks, monitoring the quality of gathered data is a critical issue. Over the time of deploy- 
ment, sensors are exposed to harsh conditions, causing some of them to fail or to deliver less accurate data. If such a degradation 
remains undetected, the usefulness of a sensor network can be greatly reduced. We present an approach that learns spatio-temporal
correlations between different sensors, and makes use of the learned model to detect misbehaving sensors by using distributed
computation and only local communication between nodes. We introduce SODESN, a distributed recurrent neural network architecture,
and a learning method to train SODESN for fault detection in a distributed scenario. Our approach is evaluated using data from 
different types of sensors and is able to work well even with less-than-perfect link qualities and more than 50% of failed nodes.

1. Introduction

Wireless sensor networks (WSN) are increasingly being deployed
over extended periods of time fS), in particular for environmental
monitoring applications. To facilitate long term
deployments in remote areas, nodes are typically powered by 
solar energy and rechargeable batteries. Consequently, much 
of the research has focussed on energy-aware design of hard- 
and software as well as on building models of energy supply 
and demand. The continuing progress in this area has led to
longer autonomy of WSN, but also revealed that deploying a 
sensor network over a long period of time requires automatic 
monitoring of the quality of gathered data and of the condi- 
tion of solar panels, sensors and batteries. With information 
about the performance of these components, maintenance trips 
to remote monitoring sites can be better planned or possibly 
avoided, leading to a reduction of management costs. Some of 
the faults might be easier to detect than others: when some of
the expected data is missing, the fault is easy to recognize.
Even in this simple case, an automatic notification relieves the 
administrator from continuously monitoring a database. When 
the network delivers data as expected, there might also be more 
subtle problems, like mis-calibration or build-up of dust on sen- 
sors and solar panels, leading to incorrect sensor readings or 
shorter duty-cycles and thus less data. To prevent this, sensor
networks have to become more user-friendly: existing systems
often require problems to be detected and diagnosed manually.
First steps towards higher reliability and user-friendliness
are to automatically build a model of the normal system behavior
and to use this model to detect anomalies. With the result of
this process, it is possible to notify administrators, who can then
decide on appropriate actions. Consequently, the system can
run unobserved with less danger of losing important data.

For this work, we are interested in detecting problems that 
manifest in changes of sensor readings for some of the nodes of 
an entire network as a result of a sensor fault. Typically, some of 
the sensors at different nodes are correlated over space or time. 
We present an approach that is able to learn spatio-temporal 
correlations and make use of them for detecting anomalies in 
a decentralized way, without using global communication during
normal operation. Instead, sensor nodes participate in a
large, distributed recurrent neural network, where each of the 
sensor nodes hosts only a few neural units and communicates 
only with its local neighbor sensor nodes. Our neural network 
approach is inspired by echo state networks (ESN) [5], a recurrent
neural network approach which has been shown to be successful
in learning even complex time series. ESN have already
been applied to anomaly detection in sensor networks [8], but
only in a way that requires one instance of an ESN on each 
node. This results in an unnecessary consumption of memory 
resources and processing power. A straightforward distribution 
of an ESN over the entire sensor network is also not a solu- 
tion, because it requires all of the nodes to communicate with 
each other. More often than not, this sort of communication is 
neither available nor desired in sensor networks. 

To address the problem of detecting sensor faults in WSN in 
a distributed way, we introduce spatially organized distributed 
echo state networks (SODESN), an architecture that allows for 
distributing a single recurrent neural network over an entire
sensor network even when the WSN imposes a local communication
structure on its connectivity matrix (Sect. 3). In Sect. 4 we
present a training method for SODESN and an approach to train
SODESN for fault detection in WSN. SODESN learn a model
of normal behavior of sensor nodes based on information from
other sensors. The fault detection in turn monitors differences
between the model and actual sensor readings in a distributed
way. We demonstrate the capabilities of our approach with data
from different temperature and radiation sensors (Sect. 5) and
discuss our results in Sect. 6. In the following section, we start
with a brief overview of related work, and give a short review 
of the ESN approach, the starting point for our work. 



2. Background 

Detecting and diagnosing faults is a challenge that has been 
addressed in many different areas for different purposes. Logic- 
based approaches, for instance, can be applied if a complete de- 
scription of the desired behavior of the system is available (see 
e.g. [10]). In distributed systems, approaches like the one in [1] detect
faults by using connections between processors to implement 
a voting based diagnosis system. WSN are distributed systems 
where different components, from batteries over sensors to pro- 
cessors, contribute to potentially many different types of faults. 
There may be problems with the energy supply, with the rout- 
ing or other communication problems, resulting in missing data 
from single nodes, or causing the whole system to deliver no 
data at all. In long-term deployments, problems like degrada- 
tion of hardware can result in inaccurate measurements, caused 
by dust and continued exposure of sensors to the environment. 
Some of the existing work tackles the problem of automatically 
detecting node failures with centralized approaches (e.g. [11]),
where relevant information is forwarded to a dedicated man- 
ager performing the fault detection. Methods to detect faults in 
a distributed way have been investigated, because global com- 
munication becomes prohibitive with increasing network sizes. 
The approach in fT\ is an example of such a decentralized ap- 
proach, where sensor faults are detected based on differences in 
the readings between neighbors. It uses only local communi- 
cation between nodes, but assumes that all sensors measure the 
same variable. Likewise, the approach in [9] detects faults in a
distributed way, but here, the assumptions are not as strong.
Neighbor sensors are not required to measure the same variable,
but are assumed to be correlated as long as they are working
normally, and uncorrelated as soon as they are faulty. This fault
detection method uses a graph-based approach to isolate faulty 
nodes in the network, where correlation between the time series 
over a time window is used to identify faults. 

In our work, we are also interested in detecting sensor faults 
in a distributed way. Instead of explicitly basing our fault de- 
tection on spatial correlations between sensors, we want our 
system to detect the relevant spatio-temporal correlations on its 
own. If we are able to distribute a large recurrent neural net- 
work over the entire sensor network, each sensor node can es- 
timate its own true values based on information from its neigh- 
bors in a training period. Because recurrent neural networks 
model dynamical systems (i.e. with a memory of past events), 
correlations can be both temporal as well as spatial. Using the 
estimated true values, and a threshold on deviation between es- 
timated value and recent readings, each node can decide if it 
can be assumed to work correctly. 



2.1. ESN technical background 

Recurrent neural networks have only recently become more 
widely used in practice, because many approaches have been 
difficult to set up and to train. An ESN is a specific type of
recurrent neural network which is able to successfully predict
complex time series [5]. At the same time, the complexity of
training an ESN is much lower than with traditional recurrent 
neural networks. Like any other neural network, ESN consist 
of neural units and synaptic connections between these units. A 
neural network is recurrent if there is at least one cycle in these 
connections. Units are typically organized in different layers 
and possess a state (called "activation"). This activation is com- 
puted (using a typically non-linear "activation function") based 
on inputs from incoming connections. Connections between 
units perform a linear transformation and can be either exci- 
tatory (positive connection weights) or inhibitory (in case of 
negative connection weights). Traditional approaches to training
recurrent neural networks, like backpropagation through
time [12], change all of the weights between different units.
The lower training complexity of ESN is a result of using a
fixed, randomly connected "reservoir" of neural units in the
recurrent layer, and only changing connections to output units
during training (see Fig. 1). Once the training is finished,
connections are no longer changed. Both the output and the next state
of the network are determined by the current state of the
network and the current input.

Figure 1: Echo State Network with input units, a recurrent layer and output units; the legend distinguishes adaptable weights from random weights.

To make the approach work, however, connections cannot
be entirely random, but need to fulfill the so-called echo state
condition [4]. For an illustration of this condition [7], consider
a time-discrete recursive function $\mathbf{x}_{t+1} = F(\mathbf{x}_t, \mathbf{u}_t)$ that is
defined at least on a compact sub-area of the vector space $\mathbf{x} \in \mathbb{R}^n$,
with $n$ the number of internal units. The $\mathbf{x}_t$ are to be interpreted
as internal states and $\mathbf{u}_t$ is some external input sequence. Now,
assume an infinite input sequence $\bar{\mathbf{u}}^{\infty} = \mathbf{u}_0, \mathbf{u}_1, \ldots$ and two
random initial internal states of the system, $\mathbf{x}_0$ and $\mathbf{y}_0$. To both
initial states $\mathbf{x}_0$ and $\mathbf{y}_0$ the sequences $\bar{\mathbf{x}}^{\infty} = \mathbf{x}_0, \mathbf{x}_1, \ldots$ and
$\bar{\mathbf{y}}^{\infty} = \mathbf{y}_0, \mathbf{y}_1, \ldots$ can be assigned:

$$\mathbf{x}_{t+1} = F(\mathbf{x}_t, \mathbf{u}_t) \tag{1}$$
$$\mathbf{y}_{t+1} = F(\mathbf{y}_t, \mathbf{u}_t) \tag{2}$$

The system $F(\cdot)$ fulfills the echo state condition if it is
independent of the input sequence $\mathbf{u}_t$, and if for any $(\mathbf{x}_0, \mathbf{y}_0)$ and all real
values $\epsilon > 0$, there exists a $\delta(\epsilon)$ for which $d(\mathbf{x}_t, \mathbf{y}_t) < \epsilon$ for all
$t > \delta(\epsilon)$, where $d$ is a square Euclidean metric. Two rules are
helpful for creating a connectivity matrix $\mathbf{W}$ with this condition:

C1 it is a necessary condition that the spectral radius of $\mathbf{W}$, i.e. the
absolute value of its biggest eigenvalue, is below one.

C2 it is a sufficient condition that the biggest singular value of
$\mathbf{W}$ is smaller than one.
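Both rules can be checked numerically for a candidate reservoir matrix. The following is a minimal sketch, assuming NumPy and a small dense matrix; the helper names, the 20% connection density and the target radius of 0.9 are illustrative choices and are not taken from the text.

```python
import numpy as np

def spectral_radius(W):
    """Largest absolute eigenvalue of a square matrix (rule C1)."""
    return float(np.max(np.abs(np.linalg.eigvals(W))))

def largest_singular_value(W):
    """Largest singular value, i.e. the spectral norm (rule C2)."""
    return float(np.linalg.norm(W, 2))

def scale_to_spectral_radius(W, target=0.9):
    """Rescale W so that its spectral radius equals `target` (hypothetical helper)."""
    rho = spectral_radius(W)
    if rho == 0.0:
        return W.copy()
    return W * (target / rho)

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(20, 20))
W[rng.random(W.shape) > 0.2] = 0.0      # keep roughly 20% of the connections
W_scaled = scale_to_spectral_radius(W, target=0.9)

print(spectral_radius(W_scaled))          # C1: below one after scaling
print(largest_singular_value(W_scaled))   # C2 holds only if this is below one
```

Since the largest singular value is never smaller than the spectral radius, a matrix scaled to satisfy C1 does not automatically satisfy the stricter rule C2.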

Using one ESN for each sensor node, or one ESN in a cen- 
tral location, would require a combination of high memory re- 
sources on each node, an explicit selection of correlated sensors 
or global communication. Instead, we describe a new approach 
where we distribute a recurrent neural network over an entire 
sensor network, fulfill the above mentioned echo state condi- 
tion, and use only communication between neighbor nodes. 

With sensor networks and recurrent neural networks, two dif- 
ferent kinds of networks play a role in the following. In order 
to avoid confusion between the two in our description, we use 
node when we talk of sensor network nodes, whereas we use 
unit for the components of a neural network. In our notation 
we use bold capital letters for matrices, bold small letters for 
vectors or vector-sized functions, and italics for scalars.

3. Spatially organized distributed echo state networks 

To distribute a recurrent neural network over a WSN, connections
between units have to be restricted to the spatial neighborhood
of sensor nodes in order to avoid unrestricted global communication.
We would also like to retain the efficient learning
of ESN. Therefore, we create neural units on each sensor node,
and follow the original idea of ESN in that all connections
between internal units are randomly initialized and fixed.
Connecting units only to spatial neighbors on different devices leads
to our idea of spatially organized distributed echo state networks
(SODESN), where the underlying communication structure
of the sensor network prohibits the use of arbitrary synaptic
connections between distributed units. More specifically, we
allow hidden units to be connected to each other only if they
are hosted on the same or on a neighbor network node. Moreover,
neural inputs are only connected to units on the same sensor
node in order to further reduce communication. Instead of
globally connected output units, we use local output units on
each sensor node. Output units get their input from the local
part of the reservoir and from reservoirs on neighbor nodes.

ESN typically use a sparsely connected reservoir, so that different
internal units develop different dynamics. Outputs are
then calculated as a linear combination of the (non-linear)
internal units. Using only local connections in SODESN almost
automatically leads to a sparse connection matrix, albeit with 
a different distribution of connections. From a global perspec- 
tive, regarding a SODESN as a single neural network, we also 
want to make sure the system fulfills the echo state condition 
mentioned in the previous section. 



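In a setup with $M$ sensor nodes, each node $m$ hosts $K_m$ input
units, $N_m$ hidden units, and $L_m$ output units. The total number
of neural units thus is

$$K = \sum_{m=1}^{M} K_m \text{ inputs}, \quad N = \sum_{m=1}^{M} N_m \text{ hidden units, and} \quad L = \sum_{m=1}^{M} L_m \text{ output units.}$$

Then, from a global perspective, the SODESN model consists
of $K$ input units with an activation vector

$$\mathbf{u}(n) = (\underbrace{u_{1_1}(n), \ldots, u_{K_1}(n)}_{\text{node } 1}, \ldots, \underbrace{u_{1_M}(n), \ldots, u_{K_M}(n)}_{\text{node } M})', \tag{3}$$

of $N$ hidden units with an activation vector

$$\mathbf{x}(n) = (x_{1_1}(n), \ldots, x_{N_1}(n), \ldots, x_{1_M}(n), \ldots, x_{N_M}(n))', \tag{4}$$

and of $L$ output units with an activation vector

$$\mathbf{y}(n) = (y_{1_1}(n), \ldots, y_{L_1}(n), \ldots, y_{1_M}(n), \ldots, y_{L_M}(n))'. \tag{5}$$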

For the rest of this paper, we assume all neural units to be 
evenly distributed over all sensor nodes, i.e. each node contains 
the same number of units. 

For theoretical considerations, it is convenient to represent
synaptic connection weights between units in several global
matrices, which have to be distributed in a practical implementation.
Connections between hidden units are represented in an
$N \times N$ matrix $\mathbf{W} = (w_{ij})$, connections from input units to hidden
units in an $N \times K$ matrix $\mathbf{W}^{in} = (w^{in}_{ij})$, and connections from
input and hidden units to output units in an $L \times (K + N)$ matrix
$\mathbf{W}^{out} = (w^{out}_{ij})$.

The activation of internal units is computed as

$$\mathbf{x}(n+1) = \mathbf{f}(\mathbf{W}^{in}\mathbf{u}(n+1) + \mathbf{W}\mathbf{x}(n)), \tag{6}$$

where $\mathbf{u}(n+1)$ represents the readings from all sensors, and $\mathbf{f}$
the vector of activation functions $f$ of all internal units. We use
$f = \tanh$ as activation function in each internal unit, and linear
input and output units ($f = 1$). In some cases, ESN use connections
projecting back from outputs into the reservoir. This
is also possible in SODESN and requires an additional matrix
$\mathbf{W}^{back}$. Consequently, the activation of internal units $\mathbf{x}(n+1)$
is then computed as $\mathbf{f}(\mathbf{W}^{in}\mathbf{u}(n+1) + \mathbf{W}\mathbf{x}(n) + \mathbf{W}^{back}\mathbf{y}(n))$. For
our application, we do not make use of these connections.
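The state update of Eq. (6) without back-projection can be sketched in a few lines. This is a minimal illustration, assuming NumPy, dense matrices and arbitrary small sizes; the function name and the random inputs are hypothetical.

```python
import numpy as np

def update_state(x, u_next, W, W_in):
    """One step of Eq. (6): x(n+1) = tanh(W_in u(n+1) + W x(n))."""
    return np.tanh(W_in @ u_next + W @ x)

rng = np.random.default_rng(1)
K, N = 2, 5                                   # illustrative numbers of input and hidden units
W_in = rng.uniform(-0.5, 0.5, size=(N, K))
W = rng.uniform(-0.5, 0.5, size=(N, N))
W *= 0.66 / np.max(np.abs(np.linalg.eigvals(W)))   # scale to a spectral radius of 0.66

x = np.zeros(N)                               # initial network state
for u_next in rng.normal(size=(10, K)):       # ten made-up input samples
    x = update_state(x, u_next, W, W_in)
```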

3.1. Proxy units 

In a practical implementation, activation vectors are distributed
over multiple sensor nodes. Moreover, there are connections
between units on different sensor nodes, which need to be stored
at a specified physical location. We store incoming connections
from units hosted on neighbor sensor nodes on the local
node. Units with outgoing connections to units on other
devices just forward their activations with no changes to the
neighbor device. Additional proxy units on the neighbor act as
place holders for remote units and take activations from connected
units. From proxy units, there are only local connections
to the reservoir or to output units. Proxy units also eliminate the
need for all sensor nodes to be synchronized as long as they all
use the same interval to process data (e.g. once every minute or
every 15 minutes). After new activations have been computed,
their values are forwarded to connected proxy units where they
can be used by the neighbor device. Once their values have
been used, proxy units are reset to 0. This is to avoid using old
values in case of a link failure between two network nodes. In
our experiments described in Sect. 5 we used link qualities
from 10% to 100%.

Figure 2: Neural units in a sensor node and connections to units on neighbor nodes.
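The behavior of a proxy unit under an unreliable link can be illustrated as follows. This is a sketch under the assumption that a link failure simply drops a message with a fixed probability; the class and function names are hypothetical and not part of the described system.

```python
import random

class ProxyUnit:
    """Place holder on a node for the activation of a unit hosted on a neighbor."""

    def __init__(self):
        self.value = 0.0

    def receive(self, activation):
        """Store an activation forwarded by the remote unit."""
        self.value = activation

    def consume(self):
        """Return the stored activation and reset to 0 so old values are never reused."""
        value, self.value = self.value, 0.0
        return value

def send(activation, proxy, link_quality, rng):
    """Deliver an activation with probability `link_quality` (simulated lossy link)."""
    if rng.random() < link_quality:
        proxy.receive(activation)
        return True
    return False

rng = random.Random(0)
proxy = ProxyUnit()
send(0.7, proxy, link_quality=1.0, rng=rng)
first = proxy.consume()     # the value that arrived in this interval
second = proxy.consume()    # nothing new arrived, so the proxy reads 0
```

Resetting on read means that a dropped message shows up as a zero input in the next interval instead of a repeated stale activation.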

3.2. Initializing an untrained SODESN 

To set up the untrained SODESN, we construct the desired
number of units on each sensor node. We create local internal
connection matrices $\mathbf{W}_j$ with a specified density, and scale
each of them so that its spectral radius does not exceed one. In
addition, we create sparse random connections between internal
units on neighbor devices, represented by connections from
proxy units for incoming connections, and references to sensor
nodes and respective proxy units for outgoing connections. Local
input connection matrices $\mathbf{W}^{in}$ with random weights fully
connect input units to all local internal units on the node (with
one input unit for each local sensor). For output units, we create
local random matrices $\mathbf{W}^{out}$ to provide them with input from
input units, proxy units and internal units.

The local internal connection matrices are scaled by their 
largest eigenvalue so that each spectral radius is at most one. 
For the entire connection matrix composed of all local matrices,
this procedure does not yet in general lead to a spectral radius
smaller than one, but it leads to similar conditions for
the internal units on each sensor node. After all local matrices 
are created in this way, the resulting global connection matrix 
is scaled to meet the echo state condition. 

Algorithm 1 generates a distributed SODESN, where each
sensor node hosts some input units, hidden units and output
units. Globally, the sensor network imposes a specific structure
on the random reservoir connectivity matrix. Figure 3 illustrates
the difference in connectivity between a standard ESN
and a SODESN.

Algorithm 1: Initialization: on each node j ...

1 Generate $K_j$ input units, $N_j$ internal units, and $L_j$ output
units

2 Generate $M_j = \sum_i N_i$ proxy units for all neighbor sensor
nodes $i$ as place holders for the internal units on neighbor
nodes

3 For each neighbor sensor node $i$, create $N_j$ pointers to
proxy units on node $i$

4 Generate a sparse, random matrix $\mathbf{W}_j$ for connections
between local internal units

5 Find $\lambda_j$ as the largest eigenvalue of $\mathbf{W}_j$

6 Scale $\mathbf{W}_j$ by $1/\max(\lambda_j, 1)$

7 Choose $\chi \in (0, 1)$, a connection density between
neighbor units

8 Generate random connections from $\chi \times M_j$ of the local
proxy units to local internal units

9 Generate and initialize an all zero $L_j \times (M_j + N_j + K_j)$
matrix for connections to local output units from all other
local units.
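From a global perspective, the initialization yields a block-structured reservoir matrix. The sketch below assembles such a matrix for a rectangular grid of nodes, assuming NumPy, a 4-neighborhood, the same number of units on every node, and the largest absolute eigenvalue as the scaling factor. As a simplification, the neighbor density is applied to individual connections instead of selecting a fraction of proxy units. All function names and parameter values are illustrative.

```python
import numpy as np

def sparse_random(rows, cols, density, rng):
    """Random matrix with weights in [-1, 1]; about `density` of the entries are non-zero."""
    W = rng.uniform(-1.0, 1.0, size=(rows, cols))
    W[rng.random((rows, cols)) >= density] = 0.0
    return W

def grid_neighbors(rows, cols):
    """Map each node index of a rows x cols grid to the indices of its nearest neighbors."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[r * cols + c] = [a * cols + b for a, b in cand
                                  if 0 <= a < rows and 0 <= b < cols]
    return nbrs

def init_reservoir(rows, cols, n_units, local_density, chi, target_radius, rng):
    """Global reservoir matrix with non-zero blocks only for same-node or neighbor-node pairs."""
    nbrs = grid_neighbors(rows, cols)
    n_nodes = rows * cols
    W = np.zeros((n_nodes * n_units, n_nodes * n_units))
    for j in range(n_nodes):
        rows_j = slice(j * n_units, (j + 1) * n_units)
        W_j = sparse_random(n_units, n_units, local_density, rng)
        lam = np.max(np.abs(np.linalg.eigvals(W_j)))
        W[rows_j, rows_j] = W_j / max(lam, 1.0)        # local matrix, scaled as in steps 4-6
        for i in nbrs[j]:                              # incoming connections from neighbor i
            cols_i = slice(i * n_units, (i + 1) * n_units)
            W[rows_j, cols_i] = sparse_random(n_units, n_units, chi, rng)
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (target_radius / rho), nbrs             # final global scaling

rng = np.random.default_rng(0)
W, nbrs = init_reservoir(2, 4, n_units=15, local_density=0.3, chi=0.1,
                         target_radius=0.66, rng=rng)
```

Blocks that belong to node pairs without a direct link stay zero, which is the structural difference to a standard ESN reservoir.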



4. A training algorithm for SODESN 

After initial setup, SODESN needs to be trained. We describe 
an approach to offline training of a SODESN in a supervised fashion,
i.e. we need time series of both input and output units as
training data. Once the training is finished, no further adaptation
is made. For our application to diagnose problems in sensor
readings, we train output units to predict readings of a sensor in 
a neighbor node. In this case, the training data can be derived 
from any input time series of "normal" sensor readings.

Algorithm 2: Offline training SODESN

Figure 3: Reservoir connectivity of a standard ESN (left) with 400 internal units
and a SODESN (right), with 400 internal units distributed over a 5 × 8 grid of
sensor nodes.



4.1. Offline training SODESN 

For a first description of the training algorithm, we regard 
SODESN as one recurrent neural network with specific con- 
nectivity - from there, a distribution of the algorithm over all 
sensor nodes is straightforward. Unfortunately, the standard 
training approach for ESN (see [4] for a detailed description)
cannot be applied, because it assumes that output units can be 
connected to any of the input or the hidden units. In SODESN, 
we want to connect output units only to local input, internal or 
proxy units. 

Training is executed in two steps, in a similar way to train- 
ing ESN: as a first step, we sample a matrix M of internal net- 
work states, and a matrix T of output activations. Samples are 
taken while feeding a training data time series into input units 
(when using connections projecting back from output units into 
the reservoir, a teacher time series has also to be fed to output 
units). For each time step of the training data, we collect a vec- 
tor of internal activations and a vector of output activations from 
our SODESN. The sampled vectors are stored in new rows of 
M and T. With N the total number of hidden units, L the number
of output units, and S the number of training steps, the final
sizes of M and T are S × N and S × L, respectively. The first
samples of a training are typically discarded in order to wash 
out the initial network state. 
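The sampling step can be sketched as follows, assuming NumPy, no back-projection, and that the state computed from input row n is paired with teacher row n. The function name and the washout length are illustrative.

```python
import numpy as np

def collect_samples(U, D, W, W_in, washout):
    """Drive the reservoir with inputs U (S x K) and collect states and teacher outputs.

    Returns M (one row of hidden activations per step after the washout)
    and T (the matching rows of the teacher signal D).
    """
    x = np.zeros(W.shape[0])
    rows_M, rows_T = [], []
    for n in range(U.shape[0]):
        x = np.tanh(W_in @ U[n] + W @ x)
        if n >= washout:               # discard the initial transient
            rows_M.append(x.copy())
            rows_T.append(D[n])
    return np.array(rows_M), np.array(rows_T)

rng = np.random.default_rng(0)
U = rng.normal(size=(30, 2))           # made-up input series with two sensors
D = U.copy()                           # teacher signal, here simply the inputs
W = 0.1 * rng.normal(size=(4, 4))
W_in = rng.normal(size=(4, 2))
M, T = collect_samples(U, D, W, W_in, washout=10)
```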

As a second step, we compute the output weights $\mathbf{W}^{out}$ so that,
for each output unit $i$, a linear combination of the internal
activations $\mathbf{x}(n)$ approximates the training time series $d(n)$ of that unit.
"Approximate" means to minimize the mean squared error on the training
signal, which, in the case of ESN, can be achieved by multiplying
the pseudoinverse of $\mathbf{M}$ with $\mathbf{T}$: $(\mathbf{W}^{out})' = \mathbf{M}^{\dagger}\mathbf{T}$. In
SODESN, however, this operation is not possible, because it
would create connections from all internal units to all the output
units. A solution to the problem is to adapt the output weights
locally, by using local matrices $\mathbf{M}_j$ and $\mathbf{T}_j$ for each
sensor node $j$. $\mathbf{M}_j$ contains only activations of local input,
internal and proxy units, while $\mathbf{T}_j$ contains output activations of
the local output units (see Algorithm 2). For each node, we
compute a local output connection matrix:

$$(\mathbf{W}^{out}_j)' = \mathbf{M}_j^{\dagger}\mathbf{T}_j \tag{7}$$

Algorithm 2 proceeds as follows:

Input: u(n), d(n), n = 0...T, T_0 < T
Initialize the network state x(0) = 0
// Sample network state for training series
Initialize M = ∅, T = ∅
for n = 0...T do
    x(n + 1) = f(W^in u(n + 1) + W x(n))
    // Discard initial states
    if n >= T_0 then
        Add x as a new row to M
        Add tanh d(n) as a new row to T
    end
end
// Compute sample matrices for each node
foreach sensor node j do
    Initialize M_j = ∅, T_j = ∅
    foreach column x' in M do
        if x' are the activations of an internal unit on the same or on a neighbor node then
            Add x' as a new column to M_j
        end
    end
    foreach column y' in T do
        if y' are the activations of an output unit on the same or on a neighbor node then
            Add y' as a new column to T_j
        end
    end
    // Compute all output weights for node j using the pseudoinverse of M_j
    (W_j^out)' = M_j^† T_j
end



An additional advantage of this operation, at least in theory,
is that it can be performed on each sensor node in parallel. In
many practical cases, however, the amount of desired training
data and the complexity of the operation will exceed the
available memory and limited processing power of small sensor
nodes. This is not a severe restriction, though, because the
training needs to be done only once and can be executed on
a remote machine. The result of the training, a set of output
weights, then has to be sent back to all nodes and installed in
the local connection matrices.
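When the training is run on a remote machine, the local output weights of all nodes can be computed in one loop. The sketch below assumes NumPy and follows the description in the text, where the teacher matrix of node j contains only the output units hosted on node j. The function name, the index arrays and the three-node chain are hypothetical, and the columns of M may hold activations of input units as well as internal units.

```python
import numpy as np

def train_local_outputs(M, T, unit_node, output_node, neighbors):
    """Per-node least squares fit: (W_out_j)' = pinv(M_j) @ T_j.

    M: S x N sampled activations, T: S x L teacher outputs.
    unit_node[k] / output_node[l]: node hosting column k of M / column l of T.
    neighbors[j]: list of nodes adjacent to node j.
    """
    unit_node = np.asarray(unit_node)
    output_node = np.asarray(output_node)
    weights = {}
    for j, nbrs in neighbors.items():
        visible = np.isin(unit_node, [j] + list(nbrs))   # units on the same or a neighbor node
        local_out = output_node == j                     # output units hosted on node j
        M_j, T_j = M[:, visible], T[:, local_out]
        weights[j] = (np.linalg.pinv(M_j) @ T_j).T       # one row per local output unit
    return weights

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 6))                 # 3 nodes with 2 units each
unit_node = [0, 0, 1, 1, 2, 2]
output_node = [0, 1, 2]
neighbors = {0: [1], 1: [0, 2], 2: [1]}      # a chain of three nodes
true_w0 = np.array([0.5, -1.0, 0.25, 2.0])   # node 0 sees the units of nodes 0 and 1
T = np.zeros((50, 3))
T[:, 0] = M[:, :4] @ true_w0
weights = train_local_outputs(M, T, unit_node, output_node, neighbors)
```

Each node only receives weights for units it can actually read, so the resulting matrices can be installed without creating non-local connections.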

4.2. Training SODESN to detect sensor faults 

With the supervised training approach described above, we 
need to provide input as well as output signals for each sensor 
node. In our application to detect sensor faults, we expect the 
input signal and output signal for a sensor to be the same when 
the sensor works normally. To gather training data, the sensor 
network has to be deployed and collect sensor readings for a pe- 
riod of time. During this period, we assume there are no sensor 
faults, so that the training output for each sensor is exactly the 
same as the input time series. 

Using only normal data for training results in the learning
picking up this correlation. The output weights will be adjusted
so that input and output always match closely. When a sensor
is faulty and delivers unexpected values to its input unit, the
respective output will be similar to the input rather than an
estimate of the true value. In such a case, we cannot distinguish
between normal and faulty sensors, so that our prediction is useless.

Figure 4: Training data and setup of the simulated sensor network.
(a) One week of the sensor data used in our experiments (sample weather data
from 9-May-2006 to 15-May-2006; 96 readings = 1 day): dry bulb temperature,
soil temperature at 1cm, 5cm, 10cm, 20cm and 50cm, radiation, and infrared.
In this graph, the time series of infrared and radiation have been scaled by 1/15.
(b) Arrows between sensor nodes indicate the local SODESN information
exchange for fault detection.

To fix the approach so that the prediction of the true value of 
one sensor is independent of its actual value, a possible solution
is to not connect this sensor to the neural network during both 
training and exploitation. The prediction is then solely based 
on inputs from other sensors. This is, however, only possible 
if we are interested in monitoring just very few sensors in the 
network. Monitoring all of the sensors would require
disconnecting all of them from the neural network. With no
remaining inputs, we cannot make any predictions, so that this 
approach is not an option. 

A more promising attempt is therefore to make only the train- 
ing of one output unit independent of the respective input unit. 
This can be achieved by training one output unit at a time, and 
disconnecting the input unit we are trying to predict during the 
training. However, this approach leads to a further problem: the 
prediction will be based on the assumption that there is no input 
from the sensor in question. During normal operation the input 
signal of the sensor will be added to some of the internal units 
and lead to a change in the output. In our experiments we found 
the influence of the incoming signal large enough to make the 
prediction useless. 

Instead of just disconnecting individual input units during 
training of their respective output units, we make sure there is 
an actual signal from all of the inputs. For the input of the sensor
we are currently training, the signal should be uncorrelated
with the true sensor value. This can be achieved, for example, by
replacing the input with a white noise signal. The correct signal
is used as teacher output, and the goal of the training is to learn 
the correlation between the true local sensor value and the value 
of neighbor sensors. 
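Preparing the training data for one output unit can be sketched as follows, assuming NumPy and Gaussian white noise. Matching the noise scale to the standard deviation of the replaced signal is an assumption made for this illustration, and the function name is hypothetical.

```python
import numpy as np

def training_pair_for_sensor(U, k, rng, noise_scale=None):
    """Return (inputs, teacher) for training the output unit of sensor k.

    U is an S x K array of fault-free readings. Column k of the inputs is
    replaced by white noise, while the teacher is the true reading of sensor k.
    """
    U = np.asarray(U, dtype=float)
    teacher = U[:, k].copy()
    inputs = U.copy()
    scale = U[:, k].std() if noise_scale is None else noise_scale   # assumed choice
    inputs[:, k] = rng.normal(0.0, scale, size=U.shape[0])
    return inputs, teacher

U = np.arange(12.0).reshape(4, 3)      # made-up readings of three sensors
inputs, teacher = training_pair_for_sensor(U, 1, np.random.default_rng(0))
```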

As mentioned above, the training aims to minimize the mean
square error on the training signal. In all our experiments, we
tested the capability of the SODESN to generalize for new data
by computing the normalized root mean square error (NRMSE)
of the predictions on an independent test set. The NRMSE of $n$
predictions $p$ of the SODESN against the test data $t$ is defined
by

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{i=1}^{n} (p_i - t_i)^2}{n \, \mathrm{var}(t)}}, \tag{8}$$

where $\mathrm{var}(t)$ is the variance of the test data.
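The NRMSE, i.e. the root of the mean squared prediction error divided by the variance of the test data, can be computed as in the following sketch. It assumes NumPy and the population variance; the function name is illustrative.

```python
import numpy as np

def nrmse(p, t):
    """Normalized root mean square error of predictions p against test data t."""
    p = np.asarray(p, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.sqrt(np.mean((p - t) ** 2) / np.var(t)))

t = np.array([1.0, 2.0, 3.0, 4.0])
perfect = nrmse(t, t)                            # exact predictions
baseline = nrmse(np.full(4, t.mean()), t)        # always predicting the mean
```

With this normalization, a predictor that always outputs the mean of the test data scores an NRMSE of one, so values well below one indicate that the model has picked up structure in the data.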



4.3. Distributed fault detection 

Using SODESN for fault detection involves making predictions
on each sensor node. It also requires setting a threshold for
sensor readings to be considered abnormal. Possible methods
for defining thresholds can be based on measuring deviations 
from the predicted value of a sensor (for example a deviation 
exceeding the maximum deviation of predictions on the test 
set), or on the NRMSE between prediction of the sensor value 
and its actual reading for a specified time window. 
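The first of these options, a threshold derived from the maximum deviation on the test set, can be sketched as follows. The sketch assumes NumPy; the function names and the sample values are hypothetical.

```python
import numpy as np

def max_deviation_threshold(pred_test, read_test):
    """Largest absolute deviation between predictions and readings on the test set."""
    diff = np.asarray(pred_test, dtype=float) - np.asarray(read_test, dtype=float)
    return float(np.max(np.abs(diff)))

def is_faulty(prediction, reading, threshold):
    """Flag a reading whose deviation from the prediction exceeds the threshold."""
    return abs(prediction - reading) > threshold

threshold = max_deviation_threshold([1.0, 2.0, 3.0], [1.1, 1.8, 3.0])
normal = is_faulty(2.0, 2.1, threshold)      # small deviation, within the threshold
suspect = is_faulty(2.0, 5.0, threshold)     # large deviation, flagged
```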

In the previous section, we set up the training so that predic- 
tions of a sensor are independent of its current value. By using 
random noise as local input during training, we base the fault 
detection of each sensor on input from the rest of the network. 
If sensors fail only rarely, only a few of them will feed faulty 
values into the SODESN at the same time. If there is a faulty 
sensor, it will continue to feed incorrect readings into the net- 
work until the problem is fixed. This will affect fault detection 
in the remaining sensors, even more so if more than one sensor 
is faulty at the same time. 

In systems with a high likelihood of simultaneous sensor failures,
it might therefore be a good idea to prevent faulty sensors
from feeding their readings into the SODESN. For the same
reasons we used random noise as input during training in the
previous section (as opposed to no input), we expect that simply
disconnecting faulty sensors does not improve the predictions:
after all, output units in other nodes used a fraction of their input
for training. In order to decrease their effect on the system,
we flag and disconnect faulty sensors from the SODESN.
Instead of using no input from faulty sensors at all, we replace
their input with the predictions of their readings as computed by
the SODESN. We expect this to help maintain a high prediction
quality for the remaining sensors with a larger number of
faults in the system.

Figure 5: Results of various experiments using SODESN and a benchmark using ESN.
(a) Learning curves (NRMSE over points of training data) for predicting the 10cm soil
temperature, air temperature, and an average learning curve over all sensors.
Experiments were run with a 90% WSN connectivity and training set sizes
of up to 30,000 data points. The graph shows results of sets of up to 18,000 points.
(b) Influence of the number of internal units per node on the learning performance.
On average (blue line), an increasing number of internal units decreases the
NRMSE only slightly. Experiments were run using a training set size of 30,000 and
a 90% success rate in communication between nodes.
(c) NRMSE over the simulated link quality between sensor nodes in %. The benchmark
against a centralized approach using one ESN for each prediction shows that the
SODESN is able to maintain a high prediction quality even with poor link qualities.
Only under ideal conditions can the centralized approach keep up with SODESN.

5. Experiments and results 

We evaluated our approach in simulations where we used 
data from a local weather station with several sensors mea- 
suring temperatures, radiation, infrared, etc. (the automatic 
weather station of the Department of Physical Geography of 
Macquarie University 0). The simulated setup consisted of 8 
sensor nodes arranged in a 2 by 4 grid where each node has one 
of the sensors and can communicate with its nearest neighbors
(see Fig. 4(b)). The sensors we used measured the air temperature,
soil temperatures at depths of 1cm, 5cm, 10cm, 20cm, and 50cm,
radiation, and infrared. The data we used was taken in 15 minute
intervals. Figure 4(a) shows data of our sensors for one week. In
the graph it is visible that the different time series are at
least weakly correlated to each other. In a setup with all
sensors measuring the same variable at slightly different
locations, correlations would be expected to be even stronger.

5.1. Experimental setup

Experiment 1 - amount of training data. A number of parameters
play a role in training and using SODESN, such as the amount of
training data, the number of units on each node, connectivity
between units, link qualities between nodes, etc. In a first
experiment, we used 15 internal units on each node with an
overall connectivity of approximately 10% between nodes. We used
a spectral radius of 0.66 for the connectivity matrix, a link
quality of 90% during both training and testing, and an
increasing amount of training data to obtain learning curves
using an incremental 10-fold cross validation. The training data
varied from 300 data points, corresponding to slightly more than
3 days' worth of data, up to 30,000 data points, i.e. data from a
period of 10 months. The test data set had a size of 16,665 data
points in all cases. For each individual experiment, a new
SODESN was generated.
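
The reservoir settings of Experiment 1 can be illustrated with a short sketch. It assumes, as is common practice for echo state networks, that the internal weight matrix is drawn at random, made sparse, and then rescaled to the target spectral radius. The function name `make_reservoir`, the uniform weight range, and the seed are illustrative choices and are not taken from the paper; the sketch also does not model how the units are distributed over the sensor nodes.

```python
import numpy as np

def make_reservoir(n_units, connectivity=0.10, spectral_radius=0.66, seed=0):
    """Random sparse weight matrix rescaled to a target spectral radius.

    Assumption: weights are uniform in [-1, 1]; the initialization
    actually used for SODESN is not given in this section.
    """
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    # Keep each connection with probability `connectivity`.
    mask = rng.random((n_units, n_units)) < connectivity
    weights = weights * mask
    # Largest absolute eigenvalue of the sparse matrix.
    radius = np.max(np.abs(np.linalg.eigvals(weights)))
    if radius == 0.0:
        raise ValueError("degenerate reservoir: all eigenvalues are zero")
    return weights * (spectral_radius / radius)

# 8 nodes with 15 internal units each, as in Experiment 1.
W = make_reservoir(8 * 15)
```

Rescaling by the ratio of target to measured spectral radius leaves the sparsity pattern untouched, so the connectivity and the spectral radius can be set independently.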

Experiment 2 - reservoir size. To evaluate if and how much an
increasing number of internal units contributes to higher
prediction quality, we varied the number of internal units per
sensor node from 3 to 39 units, resulting in SODESN with 24 up to
312 internal units. We used 30,000 data points for training and
16,665 data points for testing in a 10-fold cross validation. The
basic procedure and all other parameters remained unchanged from
the first experiment.

Experiment 3 - ESN vs. SODESN. To compare SODESN against a
baseline, we simulated fault detection with global communication
using one (centralized) ESN for each sensor in the network. The
ESN we used had 120 internal units each, equivalent to a SODESN
with 15 units on each of our 8 nodes, and we simulated link
qualities from 10% to 100% during both training and testing. In
the centralized setting, these link qualities represent the
quality of the link from sensor to central node (independent of
the number of hops).

In contrast to using SODESN, in a setup with one ESN for 
each sensor it is possible to use input data from only 7 sensors, 
predicting an 8th sensor of our sensor network. 

[Figure 6: two stacked time-series graphs; x-axis: measurement (one measurement every 15 mins). Legend of the bottom graph: dry bulb temp.; soil temp. 1cm, 5cm, 10cm, 20cm.]

Figure 6: Example of a prediction of the current soil temperature at 5cm (the solid blue line in the top graph; the true value is shown as a dotted red line). The prediction is based on inputs to other sensor nodes (bottom graph). The dotted red line in the bottom graph also shows the soil temperature at 5cm for comparison (not used in the prediction). Additional inputs were radiation and infrared (omitted for clarity).

[Figure 7: time-series graph; x-axis: measurement (one data point every 15 mins).]

Figure 7: Example of a prediction of the current air temperature (solid blue line); the true value is shown as a dotted red line. As in Fig. 6, the prediction is based on inputs to other sensor nodes.

Experiment 4 - robustness. Sensors in our first experiments
deliver time series from different (yet correlated) phenomena,
such as temperatures at different depths and radiation. To test
SODESN with more closely correlated inputs, we computed different
time series based on the air temperature data by randomly
shifting the original series by up to ±30 minutes in time and
adding uniform random noise of up to ±10%. (a) A series of tests
to determine the prediction quality was run using 8 sensors in a
2x4 grid. (b) Then, we extended the size of the sensor network to
100 nodes, arranged in a 10 x 10 grid. Using these 100 nodes, we
simulated multiple sensor faults to test the effect on the
prediction quality for other sensors. To this end, we randomly
selected an increasing number of sensors to fail. Instead of the
true value, faulty sensors constantly returned zero and fed this
value into the SODESN. (c) Finally, we tested the effect of
multiple sensor faults where faulty sensors stopped feeding
their values into the neural network. As discussed above, the
predictions of their true values as computed by the SODESN were
used instead.
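
The construction of the more closely correlated inputs can be sketched as follows. This is only an illustration under stated assumptions: shifts are taken in whole samples (±30 minutes is ±2 samples at 15-minute sampling), the shifted series wraps around at its ends, and the ±10% noise is applied multiplicatively. The function name `make_correlated_series` and the synthetic sine wave standing in for the air temperature data are hypothetical.

```python
import numpy as np

SAMPLE_MINUTES = 15  # sampling interval of the weather-station data

def make_correlated_series(base, n_series=8, max_shift_minutes=30,
                           max_noise=0.10, seed=0):
    """Derive shifted, noisy copies of one base series.

    Assumptions: shifts are whole samples and wrap around at the ends
    (np.roll), and the uniform noise is multiplicative.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(base, dtype=float)
    max_shift = max_shift_minutes // SAMPLE_MINUTES  # +/-2 samples
    series = []
    for _ in range(n_series):
        shift = int(rng.integers(-max_shift, max_shift + 1))
        noise = rng.uniform(-max_noise, max_noise, size=base.shape)
        series.append(np.roll(base, shift) * (1.0 + noise))
    return np.stack(series)

# Synthetic stand-in for one week of air temperature (not real data).
t = np.arange(7 * 96)
air_temp = 20.0 + 5.0 * np.sin(2 * np.pi * t / 96)
derived = make_correlated_series(air_temp)
```

Each row of `derived` would then serve as the input of one simulated sensor node.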

5.2. Results 



Experiment 1. Figure 5(a) gives an impression of the NRMSE we
obtained as a function of the amount of training data used.
Results are shown for two of the sensors, together with the
average NRMSE over all 8 sensors. With an increasing amount of
training data, the prediction of our SODESN becomes more
reliable, after an initial oscillating phase of 3000 data points.
Table 1 shows NRMSE and some absolute maximum errors of
predictions on test data. In particular for smaller training
sets, absolute errors of the more dynamic time series, such as
the air temperature, can become quite large for a short period
even though prediction and true value are close over longer
intervals. In this case, the NRMSE between predicted readings and
actual values over a window of time might be a more reliable
fault indicator. For less dynamic time series, such as the
different soil temperatures, both NRMSE and absolute errors are
small and may be used to indicate faults.
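
A windowed NRMSE indicator of this kind might look like the following sketch. The normalization (RMSE divided by the standard deviation of fault-free data), the window length of one day, and the threshold are assumptions made for illustration; the paper's own NRMSE definition is given outside this section, and the function names are hypothetical.

```python
import numpy as np

def windowed_nrmse(predicted, measured, scale, window=96):
    """NRMSE of each non-overlapping window.

    `scale` normalizes the RMSE; here it is assumed to be the standard
    deviation of the sensor's fault-free training data. `window` is
    96 samples, i.e. one day at 15-minute sampling (illustrative).
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    scores = []
    for start in range(0, len(measured) - window + 1, window):
        err = predicted[start:start + window] - measured[start:start + window]
        scores.append(float(np.sqrt(np.mean(err ** 2)) / scale))
    return scores

def flag_faults(scores, threshold=1.0):
    """Mark windows whose NRMSE exceeds a hypothetical threshold."""
    return [score > threshold for score in scores]

# Synthetic example (not real data): day 3 carries a 5 degree offset.
t = np.arange(4 * 96)
truth = 20.0 + 5.0 * np.sin(2 * np.pi * t / 96)
measured = truth.copy()
measured[192:288] += 5.0
flags = flag_faults(windowed_nrmse(truth, measured, scale=np.std(truth)))
```

Averaging over a window means that a single large but short-lived deviation does not raise a flag on its own, which matches the motivation given above for the more dynamic time series.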

Figure 6 shows the result of a continuous prediction of the soil
temperature at 5cm depth, while the sensor for this variable fed
just random noise into the SODESN during the whole period
(slightly more than 10 days). Similarly, the graph in Fig. 7 is a
plot of the prediction of the air temperature during the same
period, again while replacing the true temperature measurement by
random noise in the input to the SODESN.

Table 1: NRMSE and some maximum absolute errors for varying training set sizes.

NRMSE (max. abs. errors in brackets, in °C)

training    air             soil temp.     soil temp.
set size    temp.           5cm            20cm          radiat.
1500        6.76 (155.4)    0.14 (1.9)     2.64          3.40
15000       1.11 (26.6)     0.04 (1.0)     0.46          0.97
30000       0.56 (14.2)     0.04 (0.8)     0.19          0.79

Experiment 2. On average, an increasing number of internal units
in the reservoir of our SODESN did not significantly improve the
prediction quality. Figure 5(b) is a plot of the NRMSE for several
reservoir sizes from 3 to 39 units per node. It can be seen from
both the plot and Table 2 that the prediction of air temperature
seems to benefit from an increased number of units. In other
cases, using more internal units does not lead to smaller errors,
and in some, the error even increased slightly. The average NRMSE
over all sensors for SODESN with 312 internal units is only
slightly lower than the average NRMSE for SODESN with only 24
internal units.

Experiment 3. With a decreasing link quality, the accuracy of the
centralized approach using one ESN for each predicted sensor
decreases rapidly (see Table 3). In contrast to that, SODESN can
maintain the same level of accuracy even with poor link qualities
between local nodes. The graph in Fig. 5(c) shows that the ESN
can achieve results close to SODESN only under almost perfect
conditions. This seems surprising at first, but the difference in
performance is a result of the different methods to pass on
sensory information: in a centralized approach, loss of data has
a much bigger impact on the result because the missing
information is not replicated elsewhere. In our distributed
approach, data is broadcast to several neighbors (2 or 3
neighbors in our 8 node experiment, up to 4 neighbors in our
experiments with 100 nodes). Because links between nodes fail
independently in our experiments, the information lost as a
result of one link failing may still be present in the network
and can be used for prediction.
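
The redundancy argument can be made concrete with a small calculation. Assuming each link succeeds independently with probability equal to the link quality, a value broadcast to k neighbors reaches at least one of them with probability 1 - (1 - p)^k, whereas a single link to a central node delivers it with probability p. This is a simplified view, since it says nothing about how the SODESN makes use of the delivered values; the function name is hypothetical.

```python
def delivery_probability(link_quality, n_neighbors):
    """Probability that a broadcast value reaches at least one neighbor,
    assuming links fail independently (as in the simulations)."""
    return 1.0 - (1.0 - link_quality) ** n_neighbors

# Compare a single link (k = 1) with 2 to 4 neighbors.
for quality in (0.10, 0.50, 0.90):
    row = [round(delivery_probability(quality, k), 3) for k in (1, 2, 3, 4)]
    print(quality, row)
```

For example, at a link quality of 50% a value sent over one link arrives half of the time, while a broadcast to two neighbors reaches at least one of them 75% of the time.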

Table 2: Some NRMSE and maximum absolute errors for different reservoir sizes from 3 to 39 units per node.

NRMSE, and max. abs. errors in °C (in brackets)

            air             soil temp.     soil temp.
units       temperature     5cm            20cm          radiation
3           0.67 (18.2)     0.07 (1.2)     0.12          0.74
15          0.51 (15.4)     0.04 (0.8)     0.21          0.76
27          0.48 (12.0)     0.06 (1.2)     0.15          0.73
39          0.47 (11.6)     0.09 (2.1)     0.15          0.73

Experiment 4. (a) Using 8 more closely correlated air temperature
time series, we achieved an almost constant NRMSE of 0.2 for
SODESN independent of the number of units (from 3 units/node up
to 39 units/node), and a maximum absolute prediction difference
of 6°C. The lowest NRMSE in experiment 2, where we used soil
temperatures and radiation data to predict air temperature, was
0.47 (Table 2). The better performance in this scenario was
expected. (b) Scaling the experiment up to 100 sensor nodes, the
prediction has about the same quality as with only 8 sensors. We
then successively failed random sensors. A first qualitative
(visual) inspection of the predicted time series vs. the true
values shows acceptable performance up to more than 60% of failed
sensors (see Fig. 8(b) for a sample prediction with 60 failed
sensors). More quantitatively, from the graph in Fig. 8(a) we see
that failing up to 16 of the sensors does not change the
performance of the system at all. In our experiments, the average
maximum absolute error for up to 16 failed nodes was below 11°C,
and for up to 32 failed nodes, it remained below 16°C. For 60
failed nodes, the NRMSE has grown from 0.26 to about 1.0, with a
maximum absolute error of around 19°C. (c) Feeding back the
predictions of the true value instead of faulty sensor values
results in a greatly improved prediction quality, so that the
average error is almost constant for up to 50% of failed nodes.
Even for more than 50% failed nodes, the error increases only
slowly until around 90%.
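
The input substitution used in part (c) can be sketched in a few lines. The sketch assumes that a per-sensor fault flag is already available and that readings and predictions are aligned per sensor; the function name and the numbers are hypothetical and only illustrate the selection step, not the SODESN itself.

```python
import numpy as np

def effective_inputs(readings, predictions, faulty):
    """Per-sensor input for the next step: flagged sensors contribute
    their predicted value, all others their actual reading."""
    readings = np.asarray(readings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    faulty = np.asarray(faulty, dtype=bool)
    return np.where(faulty, predictions, readings)

# Hypothetical values for four sensors; sensor 2 is stuck at zero.
readings = [21.3, 20.9, 0.0, 21.1]
predictions = [21.2, 21.0, 21.4, 21.0]
faulty = [False, False, True, False]
inputs = effective_inputs(readings, predictions, faulty)
```

In this example the stuck reading of sensor 2 is replaced by its predicted value before it is fed into the network, while the other sensors pass through unchanged.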

6. Discussion 

Our first experiment showed that the amount of training data used
strongly influences the prediction quality. Further aspects seem
to be the "correlatedness" of different sensors and the dynamics
of the time series. Some of the "easier" sensors in our
experiment could be successfully modeled after training on 1500
data points (≈ 15 days of training data), while for "harder",
less correlated sensors we needed at least 5000 points (≈ 52
days). Our offline learning approach requires performing a
computation on the whole training time series. In particular for
larger data sets this will usually be done on a machine outside
the network. The learning then computes sets of output weights
for each sensor node. A way to deal with less correlated sensors
may therefore be to successively improve the SODESN by
re-training on increasingly larger data sets and exchanging the
learned weights over time.
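
The training set sizes quoted in this paper translate into deployment time as follows, assuming the 15-minute sampling interval of the weather-station data. The helper name is hypothetical.

```python
SAMPLE_MINUTES = 15  # one reading every 15 minutes

def points_to_days(n_points, sample_minutes=SAMPLE_MINUTES):
    """Length of a training set in days for a fixed sampling interval."""
    return n_points * sample_minutes / (60 * 24)

# Training set sizes mentioned in the experiments and the discussion.
for n_points in (300, 1500, 5000, 30000):
    print(n_points, round(points_to_days(n_points), 1))
```

This is a rough guide to how long a network would have to collect data before each re-training round.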

A second important factor is the amount of local communication
introduced. From the description of our architecture in Sect. 3 it
follows that neighbors exchange activations of their local
internal units - one value per unit and sample step. Results from
our second experiment are therefore interesting, because we have
seen that the number of internal units did not play a crucial role
- we used only 3 units in some experiments. SODESN communication
and the sample rate of sensors do not have to run synchronously
with each other. Alternatively, it is also possible to collect
some data locally, and to run the SODESN on larger blocks of data,
as long as all nodes run their part of the SODESN at the same
rate (proxy units would have to be changed to queues in this
case).

Table 3: NRMSE in a centralized approach using one ESN with 120 internal units compared to NRMSE of SODESN under varying WSN link qualities from 10 to 98%.

NRMSE in ESN (E) and SODESN (S)

          air temperature    soil temp. 5cm    soil temp. 20cm
link%     E       S          E       S         E       S
10        1.41    0.51       1.72    0.04      1.83    0.19
50        0.89    0.54       1.00    0.04      1.04    0.23
90        0.63    0.51       0.50    0.04      0.56    0.22
98        0.55    0.49       0.27    0.04      0.32    0.17

[Figure 8, panel (a): error plotted against the number of failed sensors. Curves: prediction with input from faulty sensors; faulty sensor input replaced by prediction.]

(a) Performance of the system with increasing number of failures, with and without replacing faulty sensors with their prediction.

[Figure 8, panels (b) and (c): sample time-series plots; the legend includes the true air temperature.]

(b) Sample prediction for one sensor with 60 out of 100 sensors failed, and continuing to feed their values into the SODESN.

(c) Sample prediction for one sensor with 60 out of 100 sensors failed, and replacing their values by predictions.

Figure 8: Results of various experiments using SODESN and a benchmark using ESN.

The amount of local computation required is similarly dependent
on the number of units. In contrast to the offline training,
exploitation requires only a few operations: for each internal
unit, a number of additions and multiplications and the
computation of tanh(x).
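
The per-step cost can be seen from a minimal sketch of one update of the internal units. It assumes the standard echo state network form, in which the new state is the tanh of a weighted sum of the previous state and the current input; the actual SODESN update is defined in Sect. 3 and additionally involves activations received from neighbor nodes, which are not modeled here. All names and sizes are hypothetical.

```python
import numpy as np

def update_units(state, inputs, w_internal, w_input):
    """One ESN-style update: weighted sums followed by tanh.

    Each unit costs one row of multiply-adds plus one tanh call.
    """
    return np.tanh(w_internal @ state + w_input @ inputs)

# Hypothetical node with 3 internal units and 1 sensor input.
rng = np.random.default_rng(0)
w_internal = rng.uniform(-0.5, 0.5, size=(3, 3))
w_input = rng.uniform(-0.5, 0.5, size=(3, 1))
state = np.zeros(3)
state = update_units(state, np.array([0.2]), w_internal, w_input)
```

With only a handful of units per node, each step amounts to a few dozen multiply-adds and a few tanh evaluations.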



7. Conclusions 

In this paper, we presented SODESN, a novel distributed re- 
current neural network architecture for creating models of dy- 
namical systems. We introduced an offline learning approach 
for SODESN that is closely related to training ESN and inherits 
the low computational complexity of the original approach. We 
then presented an approach to train SODESN for fault detection 
in WSN, where predictions of sensor values are made based on 
information from neighbor nodes. 

Our evaluations on real-world data show that our approach can be
used to build models of dynamic time series and help to detect
sensor faults. We have shown that the approach is robust to WSN
link failures through its distributed computation and local
communication. SODESN outperforms a comparable, centralized
approach assuming realistic link qualities. Using only local
communication also contributes to SODESN scaling well with an
increasing number of WSN nodes.

We have also shown that our approach is robust against multiple
node failures. In our evaluation using the predictions of failed
sensors as input, up to 50% of the sensors could fail without
noticeably affecting prediction quality, and the performance
degraded gracefully up to slightly more than 80% failed nodes.